The temporal dynamics of emotion comparison depends on low-level attentional factors

Humans are predisposed to attend to emotions conveyed by facial expressions. However, this compulsory attraction to emotions becomes challenging when multiple emotional stimuli compete for attention, as in the emotion comparison task. In this task, participants are asked to choose which of two simultaneously presented faces displays the most positive (happiest) or the most negative (angriest) emotion. Participants usually respond faster to the face displaying the most intense emotion. This effect is stronger for face pairs containing globally positive rather than globally negative emotional faces. Both effects are consistent with an attentional capture phenomenon driven by the perceptual salience of facial expressions. In the present experiment, we studied the temporal dynamics of attentional capture in the emotion comparison task by tracking participants' eye movements with gaze-contingent displays and responses. Our results show that, on the first fixation, participants were more accurate and dwelled longer on the left target face when it displayed the most intense emotion within the pair. On the second fixation, the pattern was reversed, with higher accuracy and longer gaze time on the right target face. Overall, this pattern of gazing behavior indicates that the typical results observed in the emotion comparison task arise from the optimal combination, over time, of two low-level attentional factors: the perceptual salience of emotional stimuli and the scanning habit of participants.


Aim and expectations
Here, to unveil the role of low-level attentional factors in driving efficient comparative judgments in an emotion comparison task, we measured participants' attentional shifts during different types of gazing tasks (gazing towards the happiest or the angriest face within a pair). These tasks used the same horizontally oriented half-range emotional pairs used by Fantoni et al. 18. These emotional pairs included real faces arranged side by side, with opposite global valence but equal relative emotion intensity. Specifically, there were globally positive pairs, with an intermediate and a happy facial expression (namely a 100|0 or a 0|100 pair), and globally negative pairs, with an intermediate and an angry facial expression (namely a −100|0 or a 0|−100 pair). In contrast to previous emotion comparison tasks that measured the speed and accuracy of choice using bimanual responses, we used gaze as a probe of choice accuracy. Tracking the eyes during the comparison task allowed us to measure the accuracy of the choice of the target face, depending on the specific gazing instruction, in terms of fixation accuracy (i.e. the probability of casting the gaze on the region of interest covering the target face) and dwell time (i.e. the total amount of time the gaze is cast on the region of interest covering the target face). Both indices of gaze-contingent choice behavior were treated as windows on low-level attentional processes.
www.nature.com/scientificreports/
In particular, our indices of gaze-contingent choice behavior are relevant to an unresolved question from the studies of Fantoni et al. 18,21, which concerns the different funnel patterns in spatially congruent and incongruent conditions. The authors found that selecting faces in spatially congruent emotional pairs (namely pairs displaying the happy face on the right and the neutral face on the left, or the angry face on the left and the neutral face on the right) led to a reduction of the funnel pattern. This reduction was due to a smaller happiness advantage compared to selecting faces in spatially incongruent emotional pairs (namely pairs displaying the happy face on the left and the neutral face on the right, or the angry face on the right and the neutral face on the left). Throughout the paper, we refer to the difference between the funnel patterns in spatially congruent and incongruent conditions as the spatial congruity anisotropy. Importantly, in Experiment 2 of Fantoni et al. 18 the spatial congruity anisotropy was particularly evident when emotional pairs were displayed tachistoscopically rather than until the participant's response. The reason for this difference remained unresolved, although the authors hypothesized that it resulted from an endogenous factor of attention modulating PAF 41-50. This endogenous factor of attention might be driven by the scanning habit, which prioritizes stimuli in either the left or the right visual hemifield depending on the temporal phase of the scanning of the visual scene. We refer to this second attentional factor as the Automatic-Attentional Factor (AAF). Previous research has shown that initial fixations toward faces are biased toward the left side of the face, followed by a rightward fixation 43,48.
Whether this scanning habit bias is related to innate or to cultural factors, such as reading direction, is still a matter of debate.
It is possible that the tachistoscopic view of a lateralized pair of emotions forced the attentional system to focus on the first (i.e. the left) hemifield rather than on the second (i.e. the right) hemifield, as an effect of a leftward bias produced by AAF. The direction of this bias was likely supported by cultural factors, since the participants tested by Fantoni et al. (Experiment 2) 18 were drawn from a Western culture with a left-to-right reading/writing direction. This would explain the overall faster and more accurate selection of left-emotional rather than right-emotional targets found by Fantoni et al. (Experiment 2) 18. In congruent spatial conditions, participants selected the angriest face (on the left) of an emotional pair faster and more accurately than the happiest face (on the right). Conversely, in incongruent spatial conditions, participants selected the happiest face (on the left) of the emotional pair faster and more accurately than the angriest face (on the right).
In the present study, we hypothesized and tested whether the spatial congruity anisotropy is due to the combination, over the exploration time of a face pair, of PAF (predicting a funnel pattern that is similar across spatial congruency conditions) and AAF (predicting a left or right hemifield advantage that depends on the spatial congruency condition and changes over exploration time). The actual involvement of these factors in comparison tasks has never been directly demonstrated, although previous studies have indicated that their combination may result in the funnel pattern of emotion comparison tasks 18,19,21.
A revised version of SIA that integrates PAF and AAF (integrated SIA) can predict the different funnel patterns across spatial congruency conditions (i.e. the spatial congruity anisotropy) as follows. If, according to AAF, fixations are more likely to occur in the left than in the right visual hemifield, the intensity of targets on the left will be weighted more heavily than the intensity of targets on the right, producing a left-hemifield advantage. This left-hemifield advantage modifies the funnel pattern predicted by PAF alone (as in the standard SIA), so that its shape differs between spatial congruency conditions. In particular, the lines joining the left and right fixation accuracy over the Average Emotion Intensity of the pair are predicted to cross at different points depending on spatial congruency: the crosspoint on the average emotion intensity continuum will be shifted towards negative values for spatially incongruent pairs, and towards positive values for spatially congruent pairs. According to the categorization of emotion comparison effects proposed by Fantoni et al. 18, a shift in the negative direction produces a happiness advantage, that is, a larger difference between the fixation accuracy for fully emotional and for intermediate faces in globally positive than in globally negative emotional pairs. Notably, the exact opposite imbalance between funnel patterns across spatial congruency conditions is expected in the case of a right-hemifield advantage, when fixations are more likely to occur in the right than in the left visual hemifield.
In the present study, to test the validity of the integrated SIA model, we conducted an experiment on a sample similar to that of Fantoni et al. 18 (Italian native speakers with a left-to-right reading/writing direction), using gaze responses instead of bimanual responses. We analyzed global fixation accuracy, dwell time and saccade latency, and investigated the temporal dynamics of gazing performance. We analyzed eye movements at the beginning and at the end of the gazing task, expecting different left- and right-hemifield advantages (produced by the scanning habit) in each temporal period.

Experiment
How does the deployment of attention, reflected by eye-movement patterns during a gaze-contingent emotion comparison task, differ over time depending on the engagement of PAF versus AAF?
To answer this question, we focused on three attentional indices of gaze-contingent choice behavior: fixation accuracy, dwell time, and saccade latency. These indices were extracted from each individual's flow of eye movements during the emotion comparison task. They were measured across 8 types of half-range emotional pairs, resulting from the factorial combination of 2 Target Position (Left vs. Right) × 2 Spatial Congruency with the mental spatial representation of valence (Congruent if the happiest face is displayed in the right hemifield, and Incongruent if the happiest face is displayed in the left hemifield) × 2 Average Emotion Intensity (50 for happy|intermediate and −50 for angry|intermediate pairs).
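The factorial structure above can be made concrete with a short Python sketch. It assumes intensities on a −100 to 100 scale and encodes each pair as (left, right) intensities, with Congruent meaning that the happiest face is on the right; the functions `pair_layout` and `target_side` are hypothetical helpers, not part of the authors' materials. Under these assumptions, each of the four layouts (0|100, 100|0, −100|0, 0|−100) places the target on one side under the "happiest" instruction and on the other side under the "angriest" instruction, which gives the 8 target types.

```python
from itertools import product

def pair_layout(congruency: str, avg_intensity: int) -> tuple[int, int]:
    """Return (left, right) emotion intensities for one half-range pair (hypothetical encoding)."""
    if avg_intensity == 50:        # globally positive: happy (100) + intermediate (0)
        happiest, angriest = 100, 0
    elif avg_intensity == -50:     # globally negative: intermediate (0) + angry (-100)
        happiest, angriest = 0, -100
    else:
        raise ValueError("Average Emotion Intensity must be +50 or -50")
    if congruency == "congruent":
        return (angriest, happiest)    # happiest face in the right hemifield
    if congruency == "incongruent":
        return (happiest, angriest)    # happiest face in the left hemifield
    raise ValueError("congruency must be 'congruent' or 'incongruent'")

def target_side(layout: tuple[int, int], instruction: str) -> str:
    """Side ('left' or 'right') of the target face under a gaze instruction."""
    left, right = layout
    if instruction == "happiest":
        return "left" if left > right else "right"
    if instruction == "angriest":
        return "left" if left < right else "right"
    raise ValueError("instruction must be 'happiest' or 'angriest'")

# Enumerate the 4 layouts and the target side under each instruction.
for cong, aei in product(["congruent", "incongruent"], [50, -50]):
    layout = pair_layout(cong, aei)
    print(cong, aei, layout,
          "happiest:", target_side(layout, "happiest"),
          "angriest:", target_side(layout, "angriest"))
```

Note that, under this encoding, the angriest target in congruent pairs is always on the left, matching how the Left and Right regressors are described in the Data analysis subsection.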
The assumption of our paradigm is that attentional indices of gaze choice mirror response accuracy and speed in the bimanual version of standard emotion comparison tasks. SIA predictions 11 and their integrated version (see subsection "Aim and expectations") should thus account for our pattern of eye movements as well. As a consequence, we expect the pattern of attentional indices to be characterized by the following effects: (1) A crossover resulting from a 3-way Average Emotion Intensity × Target Position × Spatial Congruency interaction. This interaction is diagnostic of an effect of PAF alone, with the lines connecting fixation accuracy/dwell time for the left and right targets crossing when plotted against average emotion intensity in both congruent and incongruent spatial positions; (2) A main effect of Average Emotion Intensity. This effect is diagnostic of a size effect, which involves a global shearing of the crossover pattern in (1), with the fixation accuracy/dwell time for globally negative pairs being smaller than that for globally positive pairs; (3) A 2-way Spatial Congruency × Target Position interaction. This interaction is diagnostic of an emotion anisotropy compatible with either a happiness advantage (a fixation accuracy/dwell time advantage for happy target faces, in right congruent or left incongruent spatial positions, over angry target faces, in left congruent or right incongruent spatial positions) or an anger advantage (a fixation accuracy/dwell time advantage for angry target faces, in left congruent or right incongruent spatial positions, over happy target faces, in right congruent or left incongruent spatial positions).
If the emotion anisotropy in (3) is present, then the crossover pattern in (1) is turned into a funnel-shaped pattern. This type of pattern is diagnosed by a 3-way interaction with a shift of the crosspoint between the lines connecting the fixation accuracy/dwell time for left vs. right targets towards negative (in the case of a happiness advantage) or positive (in the case of an anger advantage) values of average emotion intensity. Furthermore, the 2-way interaction in (3) can be nulled in the presence of a significant spatial congruity anisotropy: if the funnel patterns in congruent and incongruent spatial positions are similar in magnitude but opposite in direction (i.e. one compatible with a happiness advantage and the other with an anger advantage), then the 2-way interaction will be nullified. This result is expected if AAF (beyond PAF) is at work.
To test how this set of expected effects changes over gazing time, we capitalized on a novel approach introduced by Schurgin et al. 48. In their experiment, Schurgin et al. 48 focused on a limited set of informative fixations to explore the time course of fixation accuracy and dwell time during the exploration of different types of facial expressions. They showed that the first and second fixations are the most variable across time and conditions, and thus concluded that these fixations are the most informative in terms of fixation accuracy/dwell time. This is also consistent with results by Caspi et al. 51 (but see also 52-54), showing that most of the information guiding eye movements in a visual search task is accumulated within the first 200 ms of the display presentation. Additionally, the first fixations and saccades typically depend on perceptual salience, which is either value-driven or stimulus-driven. In line with these findings, we focused our analysis of the temporal dynamics of gazing on the accuracy and dwell time of the first and second fixations (the preliminary analysis reported in the "Data analysis" subsection also corroborated that these two fixations are the most informative).
We conducted a sensitivity analysis with G*Power 3.1 56 on our sample size (α error probability = 0.05, power (1 − β error probability) = 0.90) to establish the minimal detectable effect of our experimental design. This fell in the medium-to-large range, with a critical F = 2.02 and ηp² = 0.10.
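G*Power specifies F-test effect sizes as Cohen's f, whereas the result above is reported as ηp². The standard conversion f = √(η²/(1 − η²)) links the two and shows why ηp² = 0.10 counts as medium-to-large. A minimal Python sketch (the helper names are illustrative, not G*Power functions):

```python
import math

def eta_p2_to_f(eta_p2: float) -> float:
    """Cohen's f from partial eta squared: f = sqrt(eta^2 / (1 - eta^2))."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))

def f_to_eta_p2(f: float) -> float:
    """Inverse conversion: eta^2 = f^2 / (1 + f^2)."""
    return f * f / (1.0 + f * f)

f = eta_p2_to_f(0.10)
print(round(f, 3))  # 0.333, between Cohen's medium (0.25) and large (0.40) benchmarks
```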

Material.
Face pairs were presented on a 23.6-inch ViewSonic color monitor (1920 × 1080 pixels, 60 Hz) and generated with a custom-made program written in Experiment Builder (SR Research, Ontario, Canada) running on a Dell Precision T3500 machine (Windows 7 Ultimate). From the participant's point of view (58 cm from the monitor), each face measured 6.92° in height and 5.46° in width. The eccentricity of each face within a pair (nose to the medial axis of the screen) was 10.92°. According to the results of Bayle et al. 57, this eccentricity guaranteed a lateralized encoding of each face and was within the limits of the human ability to detect facial expressions. This is also consistent with results by Fantoni et al. 18,19 showing that the exact same pairs of facial expressions displayed at a similar eccentricity led to accurate emotion comparison performance in a bimanual task. We used the same color photographs of human facial expressions used by Fantoni et al. 18,19. The photographs were taken from 8 Caucasian characters (4 female and 4 male) selected from the Radboud University Nijmegen set 58 (character numbers: 1, 2, 4, 19, 20, 30, 46, and 71). The hair and ears of the faces were masked by a black oval vignette 18.
Eye movements were recorded with an EyeLink 1000 Desktop Mount system (sampling rate: 1000 Hz; average gaze position accuracy: 0.15°; SR Research, Ontario, Canada) in a head-free tracking setting. Recording was from the left eye, although viewing was binocular. A standard 13-point calibration was performed before the beginning of each experimental block.
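The stimulus sizes above are given in degrees of visual angle at a 58 cm viewing distance. A minimal Python sketch of the standard conversions (extent = 2·d·tan(θ/2) for a centred extent, offset = d·tan(θ) for an eccentricity) recovers the approximate physical sizes on screen. The pixel-density helper assumes square pixels and a 23.6-inch visible diagonal; it is an illustration, not a value reported by the authors.

```python
import math

VIEWING_DISTANCE_CM = 58.0  # viewing distance reported in the Material section

def extent_cm(angle_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Physical extent (cm) subtending `angle_deg`, centred on the line of sight."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def visual_angle_deg(extent: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Visual angle (deg) subtended by a centred extent of `extent` cm."""
    return math.degrees(2 * math.atan(extent / (2 * distance_cm)))

def offset_cm(eccentricity_deg: float, distance_cm: float = VIEWING_DISTANCE_CM) -> float:
    """Lateral screen offset (cm) of a point at `eccentricity_deg` from fixation."""
    return distance_cm * math.tan(math.radians(eccentricity_deg))

def pixels_per_cm(diag_inch: float = 23.6, res: tuple = (1920, 1080)) -> float:
    """Pixel density, ASSUMING square pixels and a visible diagonal of `diag_inch`."""
    diag_px = math.hypot(*res)
    return diag_px / (diag_inch * 2.54)

# Face height, face width, and nose-to-midline eccentricity, in cm.
print(round(extent_cm(6.92), 1), round(extent_cm(5.46), 1), round(offset_cm(10.92), 1))
# 7.0 5.5 11.2
```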
Experimental design. The experimental design included 64 trials per block, resulting from the full factorial combination of 8 Characters × 2 Target Position × 2 Spatial Congruency × 2 Average Emotion Intensity. Participants performed the same task in two blocks, each with a different instruction: (A) "gaze towards the angriest face within the pair" or (B) "gaze towards the happiest face within the pair". The order of the instruction blocks was fully counterbalanced across participants, with half of the participants following the A-B order and half the B-A order.
Procedure. The procedure was similar to the one adopted by Fantoni et al. 18,19, but with gaze-contingent control of choice behavior. Participants were asked to come to the lab without eye makeup to guarantee a precise and accurate parsing of gaze by the eye tracker. On arrival, participants completed the Edinburgh Handedness Inventory and received general oral instructions about the experiment. They sat in a dimly illuminated, acoustically isolated room. The experiment was introduced by on-screen instructions ("gaze towards the angriest/happiest", depending on instruction order), which informed participants that they had to choose, from a pair of horizontally aligned faces, the one that appeared to be the angriest/happiest by moving their gaze onto the target face. They were also instructed to control the advancement of the trial with the space bar while keeping their position relative to the screen stable during the experiment. After the participant's hand and body position had been carefully set, the experimental block began with the eye-tracking calibration phase, which was repeated as needed throughout the experiment. After calibration, each trial started with a gaze-contingent fixation phase in which a red cross appeared at the center of the monitor. If the gaze was outside the circular fixation region (radius = 1°) surrounding the cross for more than 15 s, the task was momentarily interrupted and the participant re-entered the calibration phase. If the gaze was inside the fixation region, the red fixation cross turned green, indicating that the participant could proceed to the response phase by pressing the space bar. After the space bar press, the fixation cross turned white and remained on view for a variable time interval (800-1300 ms), which was used to prevent anticipatory saccadic movements 59.
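The gaze-contingent fixation phase can be sketched as a small state check over gaze samples. This is a minimal Python illustration, not Experiment Builder code: the function names are hypothetical, samples are assumed to be (time in ms, x, y) in degrees relative to the cross, and drawing the 800-1300 ms interval from a uniform distribution is an assumption.

```python
import math
import random

FIXATION_RADIUS_DEG = 1.0   # circular fixation region around the cross
TIMEOUT_MS = 15_000         # gaze outside the region longer than this -> recalibrate

def fixation_phase(samples):
    """Classify the fixation phase from (t_ms, x_deg, y_deg) samples.

    Returns 'ready' (cross turns green), 'recalibrate', or 'waiting' if the
    sample stream ends first. Hypothetical helper, not the authors' program.
    """
    outside_since = None
    for t, x, y in samples:
        if math.hypot(x, y) <= FIXATION_RADIUS_DEG:
            return "ready"
        if outside_since is None:
            outside_since = t                  # start of the time spent outside
        elif t - outside_since > TIMEOUT_MS:
            return "recalibrate"
    return "waiting"

def pre_stimulus_delay_ms(rng: random.Random) -> float:
    """White-cross interval in [800, 1300] ms (uniform distribution assumed)."""
    return rng.uniform(800.0, 1300.0)

rng = random.Random(0)
print(fixation_phase([(0, 3.0, 0.0), (5, 0.2, 0.1)]), round(pre_stimulus_delay_ms(rng)))
```

Jittering the interval makes stimulus onset unpredictable, which is what discourages anticipatory saccades.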
The fixation cross was then replaced by the face pair display. The observer was required to cast the gaze on either the happiest or the angriest face within the pair (depending on the instruction condition). The stimulus remained on screen for 3000 ms after the eye tracker detected the first fixation on one of the two regions of interest surrounding each face (ROIs, 7.92° in height and 6.46° in width). This exploration phase was intended to give participants enough time to freely explore the face pair or to correct their first gaze choice. After this phase, the stimulus was replaced by a blank screen for 3000 ms.
Data analysis. The EyeLink online parser uses a "saccade picking" approach, based on instantaneous velocity and acceleration thresholds, to determine the onset and offset of saccades. Samples above the thresholds are classified as saccade samples, and samples below the thresholds as fixation samples. The parser used velocity and acceleration thresholds of 30°/s and 8000°/s², respectively. Hence, the onset of a fixation is determined by the offset of the previous saccade, and the offset of a fixation by the onset of the subsequent saccade. Fixations cast within or between the oval regions of interest matching each face of the pair were detected as long as no saccade was in progress. We analyzed data from 9254 individual fixations. Each individual fixation was encoded in terms of its dwell time (ms), its fixation accuracy (1 = the fixation is cast on the oval region of interest surrounding the target face; 0 = the fixation is cast on the oval region of interest surrounding the non-target face), and its order (1st, 2nd, and so on). Descriptive statistics of fixations are reported in the Supplementary Materials (subsections 1.1, 2.1 and 3.1).
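The threshold rule and the accuracy coding above can be sketched in Python. This is a simplified velocity/acceleration classifier using the stated thresholds, not a reimplementation of the EyeLink parser (which applies further criteria). The ROI geometry (ellipses centred at ±10.92° on the horizontal midline, with the stated ROI height and width) and all function names are assumptions for illustration.

```python
import math

SAMPLE_RATE_HZ = 1000        # EyeLink 1000 sampling rate used in the study
VELOCITY_THRESHOLD = 30.0    # deg/s
ACCEL_THRESHOLD = 8000.0     # deg/s^2

# Assumed ROI geometry in degrees, origin at screen centre.
ECCENTRICITY = 10.92
ROI_HALF_WIDTH = 6.46 / 2
ROI_HALF_HEIGHT = 7.92 / 2

def label_samples(xs, ys):
    """Label each gaze sample 'saccade' or 'fixation' (simplified threshold rule)."""
    dt = 1.0 / SAMPLE_RATE_HZ
    n = len(xs)
    velocity = [0.0] * n
    for i in range(1, n):
        velocity[i] = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) / dt
    labels = []
    for i in range(n):
        accel = (velocity[i] - velocity[i - 1]) / dt if i > 0 else 0.0
        is_saccade = velocity[i] > VELOCITY_THRESHOLD or abs(accel) > ACCEL_THRESHOLD
        labels.append("saccade" if is_saccade else "fixation")
    return labels

def in_oval_roi(x, y, centre_x):
    """True if (x, y) lies inside the elliptical ROI centred at (centre_x, 0)."""
    return ((x - centre_x) / ROI_HALF_WIDTH) ** 2 + (y / ROI_HALF_HEIGHT) ** 2 <= 1.0

def fixation_accuracy(x, y, target_side):
    """1 = fixation on the target ROI, 0 = on the non-target ROI, None = on neither."""
    on_left = in_oval_roi(x, y, -ECCENTRICITY)
    on_right = in_oval_roi(x, y, ECCENTRICITY)
    if not (on_left or on_right):
        return None
    fixated_side = "left" if on_left else "right"
    return int(fixated_side == target_side)

print(label_samples([0, 0, 0, 0.5, 1.0, 1.0], [0] * 6))
print(fixation_accuracy(-10.92, 0.0, "left"), fixation_accuracy(10.92, 0.5, "left"))
```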
Following Schurgin et al. 48 (but see also Caspi et al. 51), in a preliminary analysis of fixation accuracy we determined the number of informative fixations in our task, in order to restrict the subsequent analyses to informative fixations and saccades only. Fixation accuracy reached a plateau (90%, SD = 0.29) on the third fixation, and performance was fully stabilized across time and conditions on the fourth fixation (F(1, 3376) = 3.83, p = 0.051). The accuracy of the third fixation was at ceiling across experimental conditions, revealing a null 3-way Average Emotion Intensity × Target Position × Spatial Congruency interaction. Hence, in the rest of the paper we focused the analysis on the most informative fixations and saccades only, namely the first and the second 48, for a total of 4133 fixations and their associated saccades. From the individual saccades associated with these fixations, we extracted latency and type. Saccades were encoded as either corrective, when occurring between faces, or confirmatory, when occurring within the same face. Descriptive statistics of saccades are reported in the Supplementary Materials (subsection 3.1). As a further measure of gazing behavior, we analyzed dwell times (see subsection 2.1 of the Supplementary Materials for descriptive statistics).
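The bookkeeping in this preliminary step (accuracy by fixation order, truncation to the first two fixations, and saccade typing) can be sketched in Python. The trial format, a time-ordered list of dicts with hypothetical keys `roi` and `accuracy`, is an assumption, and typing only the saccades between retained fixations is one possible reading of "associated saccades".

```python
from collections import defaultdict
from statistics import mean

def saccade_type(from_roi: str, to_roi: str) -> str:
    """'confirmatory' if the saccade stays on the same face, 'corrective' if it switches face."""
    return "confirmatory" if from_roi == to_roi else "corrective"

def accuracy_by_order(trials, max_order=4):
    """Mean fixation accuracy for the 1st..max_order-th fixation, pooled over trials."""
    by_order = defaultdict(list)
    for fixations in trials:
        for order, fix in enumerate(fixations[:max_order], start=1):
            by_order[order].append(fix["accuracy"])
    return {order: mean(vals) for order, vals in sorted(by_order.items())}

def informative_fixations(trials, n_keep=2):
    """Keep the first `n_keep` fixations per trial and type the saccades between them."""
    kept = []
    for fixations in trials:
        first = fixations[:n_keep]
        types = [saccade_type(a["roi"], b["roi"]) for a, b in zip(first, first[1:])]
        kept.append({"fixations": first, "saccade_types": types})
    return kept

# Toy data (made-up, not study data): two trials with a left target.
trials = [
    [{"roi": "right", "accuracy": 0}, {"roi": "left", "accuracy": 1}, {"roi": "left", "accuracy": 1}],
    [{"roi": "left", "accuracy": 1}, {"roi": "left", "accuracy": 1}],
]
print(accuracy_by_order(trials))
print([t["saccade_types"] for t in informative_fixations(trials)])
```

A plateau in `accuracy_by_order` across successive orders is what motivates truncating at `n_keep`.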
Following Barr et al. 60, we performed our analyses using generalized linear mixed models with the maximal random-effects structure justified by our experimental design. These consisted of a MAXglmer (with probit as the link function, for fixation accuracy) and a MAXlmer (for dwell time and saccade latency), with a by-subject slope and intercept for each condition resulting from our Average Emotion Intensity × Target […] the conditions involved in our interactions. To examine the time course of gazing behavior, we conducted an analysis including Fixation Count (1 for first, 2 for second) as a further factor in the Average Emotion Intensity × Target Position × Spatial Congruency MAXglmer/MAXlmer model. The analysis of dwell times can be found in the Supplementary Materials (see Section 2); it was excluded from the main text for brevity and because it produced results similar to those for fixation accuracy. For similar reasons, the analysis of saccade latencies is not reported in extended form. A preliminary analysis revealed that the distributions of individual first and second saccade latencies were fully accounted for by their associated fixation accuracies. This was demonstrated by a MAXlmer (r_c = 0.31, 95% CI [0.29, 0.32], see Table 9 of the Supplementary Materials) including fixation accuracy as an additional covariate of saccade latency beyond each condition of our Average Emotion Intensity × Target Position × Spatial Congruency design (F(1, 31.88) = 2.04, p < 0.001). Fixation accuracy nulled the Target Position × Spatial Congruency × Average Emotion Intensity interaction (F(1, 153.23) = 0.014, p = 0.903), showing no evidence that the pattern of saccade latency was accounted for by any factor beyond fixation accuracy.
Overall, there was a positive relationship between saccade latency and fixation accuracy, with longer saccade latencies associated with a higher probability of an accurate fixation (β = 138.25, s.e. = 43.72, t(72.10) = 3.16, p = 0.002).
As a final analysis, we tested the predictive power of the integrated SIA model on fixation accuracy and dwell times, following the procedure proposed by Fantoni et al. 18. We predicted the pattern of fixation accuracy/dwell time with the linear combination of emotion intensity values that models PAF and produces the crossover, the size effect and the emotion anisotropy effect typically observed in emotion comparison. We then integrated the SIA by adding AAF, based on the scanning habit component, as an additional parameter. In particular, AAF was modelled as a weight modulating the emotion anisotropy component of the standard SIA depending on the Spatial Congruency condition. To do so, individuals' fixation accuracy and dwell times were first used to extract 2 individual synthetic indexes of emotion anisotropy, one for each Spatial Congruency condition, each based on the 2 Target Position regressors of that condition. Each individual index of emotion anisotropy was calculated by subtracting the intercepts of the individual best-fitting MAXglmer (for fixation accuracy)/MAXlmer (for dwell time) regressors for the selection of a target in the Left vs. Right Target Position, plotted over Average Emotion Intensity, in Congruent and Incongruent Spatial Positions. These regressors correspond to the line connecting the fixation accuracy/dwell time for a target on the Left (i.e. the angriest|emotional face and the angriest|intermediate face in spatially congruent pairs, and the happiest|intermediate face and the happiest|emotional face in spatially incongruent pairs) and the line connecting the fixation accuracy/dwell time for a target on the Right (i.e. the happiest|intermediate face and the happiest|emotional face in spatially congruent pairs, and the angriest|intermediate face and the angriest|emotional face in spatially incongruent pairs).
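The geometry behind the anisotropy index can be illustrated with a minimal Python sketch. Assumptions: plain least-squares lines on raw accuracy stand in for the individual MAXglmer/MAXlmer regressors (which are fitted on the probit scale for accuracy), and the two lines are labeled "happiest-target" and "angriest-target" (Right and Left, respectively, in congruent pairs; Left and Right in incongruent pairs). The exact scaling the authors used for k is not reproduced here; the sketch only shows that a positive intercept difference at zero Average Emotion Intensity coincides with a crosspoint shifted towards negative intensities when the two lines have opposite slopes.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def anisotropy(happiest_points, angriest_points):
    """Intercept difference at AEI = 0 and crosspoint of the two target lines.

    Points are (average_emotion_intensity, accuracy) pairs (hypothetical format).
    Returns (intercept_diff, crosspoint); crosspoint is math.inf for parallel lines.
    """
    m_h, b_h = fit_line(*zip(*happiest_points))
    m_a, b_a = fit_line(*zip(*angriest_points))
    intercept_diff = b_h - b_a
    crosspoint = math.inf if m_h == m_a else (b_a - b_h) / (m_h - m_a)
    return intercept_diff, crosspoint

# Toy values (made-up): happiest-target accuracy rises, angriest-target accuracy falls with AEI.
k, x_cross = anisotropy([(-50, 0.70), (50, 0.90)], [(-50, 0.85), (50, 0.65)])
print(round(k, 3), round(x_cross, 1))
```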
When applied to each Congruency condition, this procedure yields two values of emotion anisotropy: one for the congruent (k_cong) and one for the incongruent (k_incong) condition. In both cases, these indices quantify how much an individual fixation is biased by the selection of the most positive vs. the most negative emotion within the pair, with positive values indicating a shift of the crosspoint between the Left and Right lines towards negative values of Average Emotion Intensity. This is consistent with a funnel pattern producing a happiness advantage (i.e. an imbalance in favor of the selection of the most positive face when the average emotion intensity is null). Negative values of the k index, instead, indicate that the crosspoint of the two lines is biased towards positive values of Average Emotion Intensity, consistent with a funnel involving an anger advantage. In our analysis, the difference between individuals' k_cong and k_incong is tested with a Welch t-test and interpreted as diagnostic of the spatial congruity anisotropy.
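For reference, the Welch t statistic and its Welch-Satterthwaite degrees of freedom can be computed with the Python standard library, as in the sketch below (the helper name is illustrative). In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy k_cong and k_incong values (made-up, not study data).
t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(round(t, 4), round(df, 4))
```

Welch's version does not assume equal variances in the two sets of indices, which is why it is preferred over Student's pooled-variance test here.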
In addition, we calculated the individual values of the AAF parameter modelling the scanning habit (c) and integrated this parameter into the SIA. To do so, we considered k_cong and k_incong as the result of a linear combination of the global emotion anisotropy (K, a synthetic value of the emotion anisotropy in congruent and incongruent conditions) and c, representing the effect of the scanning habit on k_cong and k_incong, with c exerting opposite effects on k_cong and k_incong, as expected by AAF. Based on this reasoning, we obtained K and c by solving a system of two linear equations (Eq. 1) for each individual pair of k_cong and k_incong values, and computed each individual's value of c from this system (Eq. 2). According to Eqs. (1) and (2), a negative value of c represents a left-hemifield advantage, with k_incong > k_cong. When c is negative, k_cong is biased towards negative values, in the direction of an anger advantage, relative to a condition in which the spatial congruity anisotropy is null (i.e. with c = 0 and K = k_incong = k_cong). The opposite occurs for k_incong, which is biased towards positive values, in the direction of a happiness advantage. A positive value of c represents a right-hemifield advantage, as it results from a k_cong larger than k_incong. The way the parameter c varies over time captures the individual's scanning habit, without any prior assumption about the direction of that habit (whether left-to-right or right-to-left).
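Assuming that K and c enter the two anisotropy indices with unit weights and opposite signs, the system described above can be written and solved as follows. This is one form consistent with the stated constraints (c = 0 gives K = k_cong = k_incong; c < 0 gives k_incong > k_cong; c > 0 gives k_cong > k_incong); the unit scaling of c is an assumption, not the authors' stated equation.

```latex
\begin{cases}
k_{\mathrm{cong}} = K + c \\
k_{\mathrm{incong}} = K - c
\end{cases}
\quad\Longrightarrow\quad
K = \frac{k_{\mathrm{cong}} + k_{\mathrm{incong}}}{2},
\qquad
c = \frac{k_{\mathrm{cong}} - k_{\mathrm{incong}}}{2}
```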
As a synthetic measure of the balance between PAF and AAF, we analyzed individuals' Michelson contrast (M_c) between c and K, with M_c ranging from −1 to +1. Negative values indicate an imbalance in favor of PAF over AAF, while positive values indicate the opposite. In particular, we characterized every individual pattern of spatial congruity anisotropy between congruent and incongruent conditions as a specific combination of PAF (predicting a fully symmetric crossover, with c = 0) and AAF (predicting an overall performance advantage when the target is in one hemifield rather than the other, with c ≠ 0). We tested the imbalance between PAF and AAF by means of a one-sample t-test of individual M_c values against 0. We quantitatively tested the accuracy of our SIA-based predictions by remapping the entire set of individual values associated with our experimental factors, applying the linear combination of intensity components predicted by the integrated SIA. This meant recoding each individual target face value within a pair as the sum of the following components: (1) the Absolute Emotional Intensity, given a value of 100 for angry and happy faces and 0 for intermediate faces; (2) a per-subject, empirically determined weighting value (α) modeling the size effect, extracted following Fantoni et al. 18 as the best-fitting individual multiplying factor of the Average Emotion Intensity (+50 for globally positive pairs and −50 for globally negative pairs); (3) a per-subject, empirically determined value of emotion anisotropy (K); (4) a per-subject, empirically determined value of the scanning habit (c).
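The per-subject steps above (splitting the two anisotropy indices into K and c, taking their Michelson contrast, and testing it against 0) can be sketched in Python. Two assumptions are labeled in the code: k_cong = K + c and k_incong = K − c, and a Michelson contrast computed on magnitudes, (|c| − |K|)/(|c| + |K|), so that it stays within [−1, +1] even though c and K can be negative. The helper names and toy values are hypothetical.

```python
import math
from statistics import mean, stdev

def decompose(k_cong, k_incong):
    """Split the indices into K and c, ASSUMING k_cong = K + c and k_incong = K - c."""
    K = (k_cong + k_incong) / 2
    c = (k_cong - k_incong) / 2
    return K, c

def michelson_contrast(c, K):
    """(|c| - |K|) / (|c| + |K|): > 0 favours AAF (c), < 0 favours PAF (K).

    Using magnitudes is an assumption; returns 0.0 when both are zero.
    """
    denom = abs(c) + abs(K)
    return 0.0 if denom == 0 else (abs(c) - abs(K)) / denom

def one_sample_t(values, mu=0.0):
    """One-sample t statistic and degrees of freedom against `mu`."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n)), n - 1

# Toy per-subject (k_cong, k_incong) pairs (made-up numbers, not study data).
k_pairs = [(-10.0, 30.0), (-5.0, 25.0), (2.0, 18.0)]
m_values = [michelson_contrast(c, K) for K, c in (decompose(kc, ki) for kc, ki in k_pairs)]
t_stat, dof = one_sample_t(m_values)
print([round(m, 3) for m in m_values], round(t_stat, 3), dof)
```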
To test whether the SIA remapped values predicted the pattern of fixation accuracy/dwell time, we performed a MAXglmer/MAXlmer analysis testing the effects of Spatial Congruency and Target Position, including the SIA remapped values as a mediator. We also report the R² of the fit between SIA-based predicted and observed values.
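The text does not state which definition of R² was used for the predicted-versus-observed fit. The Python sketch below assumes the squared Pearson correlation between predicted and observed values, one common choice for this kind of comparison; the function name is illustrative.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed values (assumed definition)."""
    n = len(predicted)
    mp, mo = sum(predicted) / n, sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)

print(r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # 0.25
```

Unlike 1 − SS_res/SS_tot, this version ignores any constant offset or scaling between the remapped values and the observed accuracies, which suits predictors expressed in intensity units rather than accuracy units.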

Results
Fixation accuracy. Figure 1 depicts participants' fixation accuracy pooled across the first two fixations. The pattern of data closely matches the pattern of motor reactivity and accuracy found by Fantoni et al. 18 in the standard bimanual version of the emotion comparison task.
In Fig. 1, the distribution of fixation accuracy is characterized by a funnel pattern, with the best-fitting glmer regression lines connecting fixation accuracy for left (grey lines) and right (yellow lines) targets crossing over in a true interaction when plotted against Average Emotion Intensity. This occurs in both congruent (panel a) and incongruent (panel c) spatial positions. However, the pattern of fixation accuracy reveals an evident spatial congruity anisotropy, with the funnel pattern in spatially congruent conditions compatible with an anger advantage (glmer regressors crossing at positive values of average emotion intensity) and the funnel pattern in spatially incongruent conditions compatible with a happiness advantage (glmer regressors crossing at negative values of average emotion intensity). Despite this difference, the funnel pattern observed in both congruent and incongruent conditions can be fully explained by individuals' fixation accuracy values remapped according to the most parsimonious linear combination of intensity components predicted by an integrated SIA (Fig. 1, panel b; empirically determined free parameters: K = 15.85, c = −32.86, and α = 1.1). The fact that the optimal fitting values of K and c were non-null suggests that both PAF and AAF shape the pattern of fixation accuracy, producing a funnel shape, with the non-null value of c standing for a spatial congruity anisotropy. According to our hypothesis, this imbalance indicated a spatial congruity anisotropy, which was further corroborated by the analysis of the crosspoint between the best-fitting glmer regressors in congruent (k_cong) and incongruent (k_incong) spatial positions. In spatially congruent pairs, k_cong was negative, consistent with a funnel pattern compatible with an anger advantage (k_cong = −16. […] the First and Second fixation separately.
This analysis was meant to test the expectation that AAF regulates gazing behavior over time, producing opposite patterns of spatial congruity anisotropy in the First and Second fixations. The pattern of data depicted in Fig. 2 corroborated this expectation. Figure 2 shows the distribution of First (panels a-c) and Second (panels d-f) fixation accuracy as a function of Average Emotion Intensity for spatially congruent (panels a and d) and incongruent pairs (panels c and f). The same distributions of fixation accuracy are plotted as a function of the integrated SIA remapped values in panels (b) and (e). Notably, the funnel pattern observed in the First fixation (Fig. 2, panels a-c) differs from the one observed in the Second fixation (Fig. 2, panels d-f). First-fixation accuracy is characterized by a funnel pattern compatible with an anger advantage in spatially congruent conditions and a happiness advantage in spatially incongruent conditions. These funnel patterns are reversed in the Second fixation, being compatible with a happiness advantage in spatially congruent conditions and an anger advantage in spatially incongruent conditions. We corroborated these observations with a MAXglmer analysis on individuals' fixation accuracy. This analysis included all factors in our experimental design, as well as the temporal order of the fixations encoded as a dichotomous covariate (Fixation Count = 1 for the First and 2 for the Second fixation) (r_c = 0.50, 95% CI [0.49, 0.52]; see Table 2 in the Supplementary Materials). The results are consistent with an integrated SIA model in which the values of the free parameters associated with PAF (K) and AAF (c) depend on fixation order, according to a left-to-right scanning of the visual scene.
This was corroborated by a significant Target Position × Spatial Congruency × Fixation Count interaction, F(1, 4110) = 10.00, p = 0.001, ηp² = 0.312, 95% CI [0.037, 0.539]. This interaction reflected the different funnel patterns observed in the first and second fixations. To better characterize this difference, we ran separate analyses for the first and second fixations.
The MAXglmer model on first-fixation accuracy (r_c = 0.52, 95% CI [0.49, 0.54]; see Table 3 in the Supplementary Materials) revealed: (1) a general left-hemifield advantage, supported by higher fixation accuracy for a target face presented in the left rather than in the right hemifield (mean for a target on the left = 0. … The spatial congruity anisotropy is further supported by our post-hoc analyses, which revealed higher first-fixation accuracy for emotional rather than intermediate targets, asymmetrically distributed over Spatial Congruency × Average Emotion Intensity conditions. In spatially congruent conditions, the anisotropy resulted in a funnel pattern compatible with an anger advantage (k_cong = −52. …
The accuracy of the second fixation was analyzed following exactly the same rationale as for the first fixation (Fig. 2d-f). The MAXglmer model revealed a pattern opposite to the one observed for first-fixation accuracy (r_c = 0.30, 95% CI [0.27, 0.32]; see Table 4 in the Supplementary Materials). On the second fixation, participants were more accurate in detecting a target face displayed in the right rather than in the left hemifield …
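The crosspoints k_cong and k_incong are the values of Average Emotion Intensity at which the fitted accuracy lines for left and right targets intersect. The Python sketch below assumes that each line is summarized by a fixed-effect intercept and slope on the logit scale of the glmer fit; the function and coefficient names are hypothetical. Because the logistic link is strictly increasing, the probability curves cross exactly where the linear predictors do, so the crosspoint has a closed form.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def crosspoint(b0_left: float, b1_left: float, b0_right: float, b1_right: float) -> float:
    """Average Emotion Intensity at which predicted accuracy is equal for left and right targets.

    Solves b0_left + b1_left * x == b0_right + b1_right * x for x. Since logistic() is
    strictly increasing, this is also where the two probability curves intersect.
    """
    if b1_left == b1_right:
        raise ValueError("parallel regression lines have no single crosspoint")
    return (b0_right - b0_left) / (b1_left - b1_right)
```

With made-up coefficients b0_left = 0.5, b1_left = 0.25, b0_right = 1.5 and b1_right = −0.25, the lines cross at x = 2.0. Swapping the two intercepts, which gives left targets the higher baseline accuracy, moves the crosspoint to x = −2.0, the side that the text associates with an anger advantage.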

Discussion
We reported an experiment that examined the link between pairs of emotional faces shown side-by-side and the eye-movement dynamics involved in gazing as a probe of choice behavior. We proposed and tested whether the prioritization of attention by the perceptual salience of facial expressions of emotion (PAF) and the dependence of attention on the scanning habit (AAF, prioritizing faces lying in the left or right visual hemifield depending on the temporal phase of gazing) could account for comparative judgments of emotion made by gazing. In particular, the results show that low-level attentional factors alone can account for gaze-contingent comparative choice in emotional pairs, independently of the high-level knowledge-based factors typically invoked to account for results in manual simultaneous comparison tasks 8-13,29,49,50,61 .
Inspired by Schurgin et al. 48 (but see also 52-54 ), we analyzed the informative part of gazing behavior. We analyzed fixation accuracy and dwell time separately, first with the first two fixations pooled and then for each fixation on its own. Fixation accuracy, as well as dwell time and saccade latencies, produced a pattern of results similar to the one observed by Fantoni et al. 18 in bimanual simultaneous comparison tasks of facial expressions of emotion. Like the speed of manual choice of the happiest or the angriest face within a pair, fixation accuracy and dwell times were larger for facial expressions displaying the most intense emotion within the pair. Specifically, fixation accuracy and dwell times increased with both the absolute emotion intensity of the selected face and the average emotion intensity of the pair. This corresponded to a funnel pattern in the domain of gazing.
We interpreted this pattern as consistent with PAF: the perceptual salience of emotional stimuli causes emotional faces to capture more attention than neutral faces, regardless of their emotional valence (whether negative, as with angry faces, or positive, as with happy faces). This is consistent with the 3-way interaction between Average Emotion Intensity, Target Position and Spatial Congruency, with the lines connecting fixation accuracy and dwell time for Left and Right targets crossing over when plotted against Average Emotion Intensity. The crosspoint of this funnel pattern was compatible with a facilitation for positive over negative emotions (i.e. a happiness advantage) in spatially incongruent conditions and a facilitation for negative over positive emotions (i.e. an anger advantage) in spatially congruent conditions. We termed this difference between congruent and incongruent conditions the spatial congruity anisotropy. This anisotropy is consistent with a modulation of PAF by AAF, which produces a left-hemifield advantage, with higher fixation accuracy for targets displayed on the left rather than on the right.
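Dwell time, defined in the Aim section as the total time the gaze rests on the ROI covering the target face, can be computed by summing the durations of the fixations that fall inside that ROI. The sketch below assumes fixation events already parsed into (x, y, duration in ms) tuples and a rectangular ROI given as (x_min, y_min, x_max, y_max); both formats, the function name, and the optional cap that restricts the sum to the first fixations are our assumptions.

```python
from typing import Optional, Sequence, Tuple

Fixation = Tuple[float, float, float]  # (x, y, duration_ms); hypothetical event format
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in screen pixels


def dwell_time(fixations: Sequence[Fixation], roi: Rect, max_fixations: Optional[int] = None) -> float:
    """Total duration (ms) of the fixations landing inside roi.

    If max_fixations is given, only the first max_fixations fixations are considered
    (e.g. 2 to restrict the measure to the First and Second fixation).
    """
    x_min, y_min, x_max, y_max = roi
    events = fixations if max_fixations is None else fixations[:max_fixations]
    return sum(d for x, y, d in events if x_min <= x <= x_max and y_min <= y <= y_max)
```

For the made-up sequence (250, 400, 180), (750, 400, 220), (260, 410, 150) and a left ROI (100, 200, 400, 600), the whole-trial dwell time on the left face is 330 ms, and restricting it to the first two fixations gives 180 ms.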
The analysis of the pattern of gaze-contingent choice over time (first and second fixation) revealed that the spatial congruity anisotropy found on the first fixation was reversed on the second fixation: on the first fixation there was an anger advantage in spatially congruent conditions and a happiness advantage in spatially incongruent conditions, whereas on the second fixation there was a happiness advantage in spatially congruent conditions and an anger advantage in spatially incongruent conditions.
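Why a change in the favored hemifield can flip the anisotropy can be seen in an idealized linear version of the funnel. The derivation below is a simplification under stated assumptions, not the integrated SIA itself: let x be Average Emotion Intensity, assume that PAF alone yields mirror-symmetric linear predictors a + bx and a − bx for left and right targets (crossing at x = 0, i.e. no valence advantage), and let AAF add a hemifield offset h to left targets (h > 0 favoring the left hemifield, h < 0 the right).

```latex
\begin{aligned}
A_L(x) &= a + b\,x + h, \qquad A_R(x) = a - b\,x,\\
A_L(x^{*}) = A_R(x^{*}) \;&\Longrightarrow\; a + b\,x^{*} + h = a - b\,x^{*}\\
&\Longrightarrow\; x^{*} = -\frac{h}{2b}.
\end{aligned}
```

The crosspoint therefore has the sign opposite to h/b. If the left-target slope b has opposite signs in the two congruency conditions (an assumption of this idealization), a single left-hemifield offset places the crosspoint at negative x in one condition and at positive x in the other, and reversing the sign of h from the first to the second fixation reverses both crosspoints.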
We hypothesized and found that the different patterns of spatial congruity anisotropy, found in the present experiment as well as in Fantoni et al. 18 , are accounted for by an integrated SIA model. The integrated SIA model formalizes a general interpretative framework for the origin of the funnel pattern in emotion comparison tasks based on purely low-level factors such as PAF and AAF. It adds AAF, driven by the scanning habit, as a free parameter alongside PAF, which was already modelled by the standard SIA 18,19 . In particular, the pattern of data at the beginning of the informative part of gazing behavior (i.e. the first fixation) is consistent with an integrated SIA model that produces an overall facilitation for targets displayed in the left rather than the right visual hemifield. This pattern also emerges when the display is presented tachistoscopically 18 . This effect, combined with the crossover pattern predicted by PAF alone, results in a funnel pattern compatible with an anger advantage in spatially congruent conditions and a happiness advantage in spatially incongruent conditions. In contrast, the fixation accuracy pattern observed at the end of the informative part of gazing behavior, namely the second fixation, can be explained by an integrated SIA model that produces a hindering effect for targets displayed in the left visual hemifield compared to the right. This is consistent with a funnel pattern compatible with a happiness advantage in spatially congruent conditions vs. an anger advantage in spatially incongruent conditions. The integrated SIA model fully accounts for both the patterns of fixation accuracy over time (first and second fixation separately) and their pooled pattern.
This was shown by the high predictive power of remapping the emotional pairs as the simple sum Target Absolute Emotion Intensity + Average Emotion Intensity × size effect + Emotion anisotropy + Left-to-Right scanning habit.
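The verbal sum above can be written as a small function. The Python sketch below is one literal reading of it, not the published integrated SIA: the function name, the mapping of α to the size-effect weight, of K to the emotion-anisotropy term and of c to the scanning-habit term, and the ±1 sign coding by pair valence and target side are all our assumptions.

```python
def integrated_sia_remap(
    target_abs_intensity: float,
    average_intensity: float,
    pair_is_positive: bool,
    target_on_left: bool,
    alpha: float,
    k: float,
    c: float,
) -> float:
    """Hypothetical additive remapping of one trial, term by term:

    Target Absolute Emotion Intensity
    + Average Emotion Intensity x size effect (alpha, assumed role)
    + Emotion anisotropy (k, assumed: +k for globally positive pairs, -k for negative ones)
    + Left-to-Right scanning habit (c, assumed: +c for left targets, -c for right ones)
    """
    emotion_anisotropy = k if pair_is_positive else -k
    scanning_habit = c if target_on_left else -c
    return target_abs_intensity + alpha * average_intensity + emotion_anisotropy + scanning_habit
```

In this reading, fitting the model amounts to choosing α, K and c so that the remapped values best predict fixation accuracy, and refitting c separately for the First and Second fixation is where a change in the favored hemifield over time would appear (for instance, as a change in the sign of c).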
The optimal values of AAF that account for our data over the first and second fixations are consistent with a scanning habit developing over time from left to right. This scanning habit likely corresponds to the reading direction of our Italian sample of participants. Given this correspondence, our results cannot establish whether the direction of scanning of the visual scene in our task is driven by cultural or innate factors. Further research is needed to answer this question, possibly by investigating this attentional bias in participants from cultures where the dominant writing system runs right-to-left (e.g. Arabic, Hebrew).
How can our finding that choice in emotion comparison is driven purely by low-level attentional components, rather than by high-level components, be reconciled with the standard interpretation of comparison tasks?
The discovery of the semantic congruity effect 8 has sparked a theoretical debate regarding the nature of the comparison process that produces the observed choice pattern in comparison tasks. Specifically, the debate focuses on whether the comparison is driven by high-level or low-level cognitive processes. Many researchers 8,12-16 have conceived the pattern of choice observed in comparison tasks as a by-product of the congruity between different types of stimulus properties (e.g. quantity, lightness, spatial position, size, emotion, etc.) and their high-level semantic representations (i.e. the one defined by the task or the implicit magnitude representation of the corresponding stimulus properties). This standard approach to interpreting congruity effects assumes an explicit knowledge scheme linking stimulus properties to their implicit magnitude representations. Notable examples of this approach have been applied to the representation of numbers (e.g. Ref. 12 ), spatial position (e.g. Ref. 18 ), size (e.g. pictures/words of animals 13 ), age (e.g. Ref. 62 ), probabilities of events (e.g. Ref. 63 ), skin color 61 , sound pressure level of acoustic stimuli 64 , temperature 16 , brightness 17,65 , and height, depth, size, and width 66 .
However, our results, together with the general framework operationalized by the integrated SIA model, provide a general way to interpret the results of comparison tasks regardless of high-level semantic knowledge, based on cognitive mechanisms involving low-level attentional factors alone. Such mechanisms require only a basic cognitive system. This idea is consistent with recent research in which the same pattern of results is found in pre-linguistic toddlers 23,65 and in animals such as monkeys 14,67 . SIA can be used to interpret the pattern of choice in comparison tasks regardless of the type of instruction used and of any representation of stimulus intensity along a unidimensional magnitude continuum. The way AAF and PAF are modelled by SIA suggests that the choice of emotion, even in an abstract situation such as the simultaneous comparison of two facial expressions, cannot be attributed univocally to one specific emotional valence: neither to positive emotions, known to produce a happiness advantage, nor to negative emotions, known to produce an anger advantage. Our results indeed demonstrate that the advantage of one emotional valence over the other depends on stimulus-driven attentional factors that are likely to depend on the perceptual salience of emotions (as extracted from perceptual cues associated with facial expressions) and on the temporal dynamics of gazing behavior (which prioritizes one hemifield over the other in different temporal phases). In particular, the specific temporal phase of gazing (whether at the beginning or at the end of the exploration) can turn a happiness advantage into an anger advantage (and vice versa), making the labelling of a happiness or an anger advantage meaningful only in relative (not absolute) terms.
Future studies could investigate the generalizability of the AAF driven by scanning habits and extend our study of emotion comparison tasks to test how fixation and dwell time patterns are affected by spatial orientations of the emotional pairs other than the horizontal one. For example, studies could explore how a vertical arrangement, with one face above the other, affects the observed patterns.
To conclude, our results show that patterns of data generally attributed to high-level semantic and linguistic cognitive factors in comparative judgements can be accounted for by the incidental combination of purely stimulus-driven, exogenous perceptual factors (PAF) and purely stimulus-independent, endogenous automatic attentional factors (AAF). These two factors may constitute the evolutionary building blocks of the well-known semantic congruity effect. The way these two factors can be operationally combined, as modelled by the SIA, to predict different aspects of choice in comparative judgments (i.e., the speed, the accuracy and the latency of choice) may be used to inform object classification algorithms, making their discrimination faster and more efficient in situations involving the simultaneous comparison of multiple options. This could be beneficial where fast and accurate decision-making is necessary, such as in medical diagnosis or security screening.

Data availability
All raw data generated in this study are included in this published article as a supplementary materials file (Raw_Data.xlsx).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.